Motivation

  • New York City is known for its concrete jungle, but there is much more to explore regarding green spaces.
  • Throughout history, many disinvested areas found gardens as a sense of community following the 1970s.
  • Very little has been explored with community gardens and other factors such as economic and health conditions. Our goal was to further understanding these relationships.

Questions

Our initial question was to explore how gardens were distributed throughout the city and how these gardens affect the people in each community. As we moved through the project, we solidified specific questions and this website now answers four main questions. We wanted to group specific related variables by theme, hoping to make the information easier to digest for the reader. For example, information about poverty, budgets and rent burden is all grouped in the economic investment page.

  1. How are community gardens distributed throughout the city?
  2. How are the locations of gardens related to certain demographic characteristics?
  3. What is the relationship between community gardens and economic investment?
  4. What is the relationship between community gardens and select health conditions?

Data

Data Sources

Data Cleaning

Each dataset that was obtained was first individually cleaned. Most were already well organized, and unnecessary variables were removed for efficiency and clarity. Average variables were calculated in these steps as well.

# Cleaning and tidying primary NYC garden data
gardens = 
  read_csv("./data/nyc_community_gardens.csv") %>% 
  janitor::clean_names() %>% 
  mutate(
    community_board = replace(community_board, community_board == "N/A", NA)
  ) %>% 
  drop_na(community_board) %>% 
  relocate(community_board) %>% 
  select(-prop_id,-census_tract, -bin, -bbl, -nta, -council_district, -cross_streets, -jurisdiction, -postcode)

# Cleaning and tidying demographic data
demo = 
  read_xlsx("./data/health_profiles.xlsx",
  sheet = "CHP_all_data", skip = 1) %>% 
  janitor::clean_names() %>% 
  slice(-(1:6)) %>% 
  slice_head(n = 59) %>% 
  mutate(
    id_spatial = id, 
    id = str_replace(id, "^1", "M"),
    id = str_replace(id, "^2", "X"),
    id = str_replace(id, "^3", "B"),
    id = str_replace(id, "^4", "Q"),
    id = str_replace(id, "^5", "R"),
    gentrification = replace(gentrification, gentrification == "n/a", NA),
    avertable_death = replace(avertable_death, avertable_death == "n/a", NA),
    avertable_death = replace(avertable_death, avertable_death == "^", NA)
  ) %>% 
  rename(community_board = id, not_complete_hs = edu_did_not_complete_hs, hs_some_college = edu_hs_grad_some_college, college_higher = edu_college_degree_and_higher) %>% 
  select(id_spatial, community_board:age65plus, on_time_hs_grad, not_complete_hs, hs_some_college, college_higher, poverty, rent_burden, obesity, hypertension, life_expectancy, self_rep_health)

# Cleaning and tidying property data
property = 
  read_csv("./data/property_values.csv") %>% 
  janitor::clean_names() %>% drop_na(community_board) %>% 
  rename(old_community_board = community_board) %>% 
  mutate(
    boro_letter = case_when(
      borough == "MANHATTAN" ~ "M",
      borough == "BRONX" ~ "X",
      borough == "BROOKLYN" ~ "B",
      borough == "QUEENS" ~ "Q",
      borough == "STATEN IS" ~ "R"
    ),
    board_num = ifelse(nchar(old_community_board) == 1, paste0(0, old_community_board), old_community_board)
  ) %>% 
  unite("community_board", boro_letter, board_num, sep = "") %>% 
  relocate(community_board, old_community_board, borough) %>% 
  select(community_board, original_market_value, revised_market_value)

# Grouping property data by community board to prepare for merging.
property_tidy = 
  property %>% 
  group_by(community_board) %>% 
  summarize(
    avg_org_value = mean(original_market_value, na.rm = TRUE),
    avg_rev_value = mean(revised_market_value, na.rm = TRUE)
  )

# Cleaning and tidying budget data
budget = 
  read_csv("./data/budget_tracker.csv") %>% 
  janitor::clean_names() %>% drop_na(community_board) %>% 
  rename(old_community_board = community_board) %>% 
  mutate(
    boro_letter = case_when(
      borough == "MANHATTAN" ~ "M",
      borough == "BRONX" ~ "X",
      borough == "BROOKLYN" ~ "B",
      borough == "QUEENS" ~ "Q",
      borough == "STATEN IS" ~ "R"
    ),
    board_num = ifelse(nchar(old_community_board) == 1, paste0(0, old_community_board), old_community_board)
  ) %>% 
  unite("community_board", boro_letter, board_num, sep = "") %>% 
  relocate(community_board, old_community_board, borough) %>% 
  select(community_board, vote_year, total_appropriated)

# Grouping budget data by community board to prepare for merging and calculating the average budgets for each community board.
budget_tidy = 
  budget %>% 
  group_by(community_board) %>% 
  summarize(
    avg_tot_appropriated = mean(total_appropriated, na.rm = TRUE)
  )

Afterwards, we merged all datasets into one tidied dataset.

final_tidy1 =
  left_join(demo, gardens, by = "community_board") %>% 
  left_join(., property_tidy, by = "community_board") %>% 
  left_join(., budget_tidy, by = "community_board")

# Calculating number of gardens in each community board
num_gardens <-
  final_tidy1 %>% 
  drop_na(garden_name) %>% 
  group_by(community_board) %>% 
  summarize(garden_num = n())

unique_comboard <-
  final_tidy1 %>% 
  distinct(community_board, .keep_all = TRUE)

# Complete merging
final_tidy <-
  left_join(unique_comboard, num_gardens, by = "community_board") %>% 
  replace_na(list(garden_num = 0)) 

final_extra <-
  left_join(final_tidy1, num_gardens, by = "community_board") %>% 
  replace_na(list(garden_num = 0))

write_csv(final_extra, "./data/final_extra.csv")
write_csv(final_tidy, "./data/final_df.csv")
write_csv(final_tidy1, "./data/final_df_main.csv")

The final tidied dataset contains 59 observations (the number of community boards in NYC) and consists of the following 36 variables:

  • id_spatial. Unique spatial ID for each board.
  • community_board. Community board location of each garden.
  • borough. Borough location of each community board.
  • name. Neighborhood name location of each community board.
  • overall_pop. Overall population number in each community board.
  • race_white. Percent self-reporting non-hispanic white.
  • race_black. Percent self-reporting non-hispanic black.
  • race_asian. Percent self-reporting non-hispanic asian.
  • race_latino. Percent self-reporting hispanic/latino.
  • race_other. Percent self-reporting other (American Indian, Native Hawaiian, Pacific Islander, two or more races).
  • age0to17. Percent age between 0 and 17 years.
  • age18to24. Percent age between 18 and 24 years.
  • age25to44. Percent age between 25 and 44 years.
  • age45to64. Percent age between 45 and 64 years.
  • age65plus. Percent age 65 years and older.
  • on_time_hs_grad. Percentage of public high school freshman from the 2013-2014 school year who graduated in 4 years
  • not_complete_hs. Percentage of adults ages 25 and older whose highest level of education is less than a high school diploma or GED.
  • hs_some_college. Percentage of adults ages 25 and older who have a high school diploma or a high school diploma and some college.
  • college_higher. Percentage of adults ages 25 and older who obtained an educational degree above a High School diploma (Associate’s, Bachelor’s, or Graduate or professional degree).
  • poverty. Percentage of residents living below 100% of New York City’s calculated poverty threshold based on income and necessary expenses.
  • rent_burden. Percentage of renter-occupied homes whose gross rent (contract rent plus estimated average monthly cost of utilities including electricity, gas, and water and sewer) is equal to or greater than 30 percent of household income in past 12 months.
  • obesity. Percentage of adults ages 18 and older who have obesity (Body Mass Index of 30 or greater) based on self-reported height and weight.
  • hypertension. Percentage of adults ages 18 and older who report ever being told by a healthcare professional that they have hypertension, also known as high blood pressure.
  • life_expectancy. Life expectancy at birth.
  • self_rep_health. Percentage of adults ages 18 and older who report their overall health is “excellent,” “very good” or “good” on a scale of excellent, very good, good, fair or poor.
  • boro. Borough location of each community board.
  • garden_name. Name of garden.
  • address. Address of garden.
  • size. Size of garden in acres.
  • neighborhood_name. Neighborhood name location of each community board
  • latitude. Latitude location of garden.
  • longitude. Longitude location of garden.
  • avg_org_value. Average original market value.
  • avg_rev_value. Average revised market value.
  • avg_tot_appropriated. Average budget allocated to each community board.
  • garden_num. Number of gardens in each community board.

Exploratory Analysis

Our exploratory analysis is divided into 3 main parts. Garden distribution focuses on the geographic location of the gardens, the number of gardens in each community board, and also includes demographic information (age, education, race) for each board and borough. Health conditions provides a visual comparing number of gardens to percent obesity, hypertension, life expectancy, and self reported health. Economic investment shows a visual comparison between budget allocated to each board and market value, poverty percent and rent burden percent.

Garden Distribution

To explore the distribution of gardens in NYC, we first created a leaflet showing each garden as well as the outline of each community board.

# Read in tidied dataset.
garden =
  read_csv("./data/final_df.csv") %>%
  group_by(community_board) %>%
  select(id_spatial, garden_num, community_board, borough, overall_pop, race_white:college_higher, garden_name:longitude)

garden_all = 
  read_csv("./data/final_df_main.csv")

# Create leaf icons for leaflet map.
leafIcons <- icons(
  iconUrl = "http://leafletjs.com/examples/custom-icons/leaf-green.png",
  iconWidth = 10, iconHeight = 20
)

# Prepare for leaflet outline.
working <- getwd()
com_board_spdf <- readOGR(dsn = working, layer = "community_board_new", verbose = FALSE)
invisible(names(com_board_spdf))
data_spatial <- merge(com_board_spdf, garden, by.x = "boro_cd", by.y = "id_spatial")

# Add popup labels for map.
garden_all = 
  garden_all %>%
     mutate(
      click_label =
        str_c("<b>", garden_name, "</b><br>", borough, "</b><br>", size, " acres"))

# Prepare the text for labels:
mytext <- paste(
    "Borough: ", data_spatial@data$borough,"<br/>",
    "Community Board: ", data_spatial@data$community_board,"<br/>", 
    "Number of Gardens: ", data_spatial@data$garden_num, "<br/>", 
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by number of gardens in the community district
leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons(data = data_spatial,
    fillOpacity = 0, 
    smoothFactor = 0.5, 
    stroke = TRUE, 
    color = "black", 
    weight = 1,
    label = mytext,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>% 
    addMarkers(data = garden_all, ~longitude, ~latitude, icon = leafIcons, popup = ~click_label)

We also created a graph showing the distribution of gardens by number.

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   community_board = col_character(),
##   borough = col_character(),
##   name = col_character(),
##   boro = col_character(),
##   garden_name = col_character(),
##   address = col_character(),
##   neighborhood_name = col_character()
## )
## See spec(...) for full column specifications.
## `summarise()` ungrouping output (override with `.groups` argument)

We then created tables showing an overview of demographic information allows the reader to gain a better understanding of the landscape of New York City.

Age

# Creating age tables
age_tables =
  garden_all %>% 
  select(borough, community_board, age0to17:age65plus) %>% 
  distinct()

# Formatting column names
colnames(age_tables) = c("Borough", "Community Board", "0 to 17 years", "18 to 24 years", "25 to 44 years", "45 to 64 years", "65 plus")

# Interactive table including averages
reactable(age_tables, groupBy = "Borough",highlight = TRUE, searchable = TRUE, striped = FALSE, fullWidth = FALSE, showSortIcon = FALSE, 
          theme = reactableTheme(
    borderColor = "#89C281",
    highlightColor = "#C0DCBC"), 
    defaultColDef = colDef(
      align = "left",
      minWidth = 100),
    columns = list(
  'Community Board' = colDef(),
  "0 to 17 years" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "18 to 24 years" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "25 to 44 years" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "45 to 64 years" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "65 plus" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%"))))

Education

# Creating education table
edu_tables =
  garden_all %>% 
  select(borough, community_board, on_time_hs_grad:college_higher) %>% 
  distinct()

# Additional order formatting
edu_tables = 
  edu_tables %>% 
 relocate(not_complete_hs, .after = community_board)

# Creating column names
colnames(edu_tables) = c("Borough", "Community Board", "Less than High School", "High School", "Some College", "College or higher")

# Creating interactive table and averages
reactable(edu_tables, groupBy = "Borough", highlight = TRUE, searchable = TRUE, striped = FALSE, fullWidth = FALSE, showSortIcon = FALSE,
          theme = reactableTheme(
    borderColor = "#89C281",
    highlightColor = "#C0DCBC"), defaultColDef = colDef(
      align = "left",
      minWidth = 100),
    columns = list(
  'Community Board' = colDef(),
  "Less than High School" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "High School" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "Some College" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "College or higher" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%"))))

Race

# Creating race table
race_tables =
  garden_all %>% 
  select(borough, community_board, race_white:race_other) %>% 
  distinct()

# Formatting column names
colnames(race_tables) = c("Borough", "Community Board", "White", "Black", "Asian", "Latino", "Other")

# Creating interactive table and averages
reactable(race_tables, highlight = TRUE, searchable = TRUE, striped = FALSE, fullWidth = FALSE, showSortIcon = FALSE, 
          theme = reactableTheme(
    borderColor = "#89C281",
    highlightColor = "#C0DCBC"),
          groupBy = "Borough", defaultColDef = colDef(
      align = "left",
      minWidth = 100),
    columns = list(
  "Community Board" = colDef(),
  White = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  Black = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  Asian = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  Latino = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  Other = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%"))))

The map illustrates that there are many gardens distributed around the city, with more gardens in Brooklyn, Queens and the Bronx. Additionally, we can see that the community boards are sectioned in fairly similar sizes.

The age table shows that the distribution in age across the five boroughs are similar, with the majority of people being 25 to 44 years old. However, there seems to be less children living in Manhattan when compared to the other boroughs. In terms of education, there is a larger difference. It appears that most people living in New York City have a high school education, with percentages tapering quickly moving towards college. Manhattan has the highest percentage of a college education or higher (65%), which is much higher than all other boroughs, more than doubling the percentage in the Bronx. Race also appears to be highly divided. While the majority of people in Staten Island are white, the majority of people in the Bronx are Latino. Manhattan looks to be most evenly distributed, showing the difference in racial makeup for each geographical location.

This information serves as a foundation for the other analyses to build off of.

Health Conditions

For health conditions, we selected a few diseases and other measures of health that have been associated with green spaces, being outdoors and have shorter term effects. As a result, we focused on obesity, hypertension, life expectancy and self-reported health. In order to allow for comparisons with number of gardens, we created choropleths for each health condition to allow for side by side comparisons.

# Creating subset of tidied data for health exploration
health_data_final <-
  read_csv("./data/final_df.csv") %>% 
  select(community_board, borough, obesity, hypertension, life_expectancy, self_rep_health, garden_num, id_spatial) 
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   community_board = col_character(),
##   borough = col_character(),
##   name = col_character(),
##   boro = col_character(),
##   garden_name = col_character(),
##   address = col_character(),
##   neighborhood_name = col_character()
## )
## See spec(...) for full column specifications.
working <- getwd()
com_board_spdf <- readOGR(dsn = working, layer = "community_board_new", verbose = FALSE)
invisible(names(com_board_spdf))
health_data_spatial <- merge(com_board_spdf, health_data_final, by.x = "boro_cd", by.y = "id_spatial")
# Results from spatial regression or cluster identification to assess the relationship between these variables and community gardens

# Create a color palette for the obesity map:
pal2 <- colorNumeric("BuPu", health_data_spatial@data$obesity)

# Prepare the text for obesity labels:
mytext2 <- paste(
  "Borough: ", health_data_spatial@data$borough,"<br/>",
    "Community Board: ", health_data_spatial@data$community_board,"<br/>", 
    "Percent Obesity: ", health_data_spatial@data$obesity, "<br/>", 
    "Number of Gardens: ", health_data_spatial@data$garden_num, "<br/>",
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by percent obesity in the community district
leaflet(health_data_spatial) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons( 
    fillOpacity = 0.8, 
    smoothFactor = 0.5, 
    fillColor = ~colorQuantile("BuPu", obesity)(obesity), 
    stroke = TRUE, 
    color = "black", 
    weight = 0.3,
    label = mytext2,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  addLegend( pal = pal2, values = ~obesity, opacity = 0.9, title = "Percent Obesity", position = "topleft" )
# Create a color palette for the hypertension map:
pal3 <- colorNumeric("BuPu", health_data_spatial@data$hypertension)

# Prepare the text for hypertension labels:
mytext3 <- paste(
    "Borough: ", health_data_spatial@data$borough,"<br/>",
    "Community Board: ", health_data_spatial@data$community_board,"<br/>", 
    "Percent Hypertension: ", health_data_spatial@data$hypertension, "<br/>", 
    "Number of Gardens: ", health_data_spatial@data$garden_num, "<br/>",
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by percent hypertension in the community district
leaflet(health_data_spatial) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons( 
    fillOpacity = 0.8, 
    smoothFactor = 0.5, 
    fillColor = ~colorQuantile("BuPu", hypertension)(hypertension), 
    stroke = TRUE, 
    color = "black", 
    weight = 0.3,
    label = mytext3,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  addLegend( pal = pal3, values = ~hypertension, opacity = 0.9, title = "Percent Hypertension", position = "topleft" )
# Create a color palette for the life expectancy map:
pal4 <- colorNumeric("BuPu", health_data_spatial@data$life_expectancy)

# Prepare the text for life expectancy labels:
mytext4 <- paste(
    "Borough: ", health_data_spatial@data$borough,"<br/>",
    "Community Board: ", health_data_spatial@data$community_board,"<br/>", 
    "Life Expectancy: ", health_data_spatial@data$life_expectancy, "<br/>", 
    "Number of Gardens: ", health_data_spatial@data$garden_num, "<br/>",
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by life expectancy in the community district
leaflet(health_data_spatial) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons( 
    fillOpacity = 0.8, 
    smoothFactor = 0.5, 
    fillColor = ~colorQuantile("BuPu", life_expectancy)(life_expectancy), 
    stroke = TRUE, 
    color = "black", 
    weight = 0.3,
    label = mytext4,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  addLegend( pal = pal4, values = ~life_expectancy, opacity = 0.9, title = "Life Expectancy", position = "topleft" )
# Create a color palette for the self reported health map:
pal5 <- colorNumeric("BuPu", health_data_spatial@data$self_rep_health)

# Prepare the text for self reported health labels:
mytext5 <- paste(
    "Borough: ", health_data_spatial@data$borough,"<br/>",
    "Community Board: ", health_data_spatial@data$community_board,"<br/>", 
    "Self Reported Health: ", health_data_spatial@data$self_rep_health, "<br/>",
    "Number of Gardens: ", health_data_spatial@data$garden_num, "<br/>",
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by self reported health in the community district
leaflet(health_data_spatial) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons( 
    fillOpacity = 0.8, 
    smoothFactor = 0.5, 
    fillColor = ~colorQuantile("BuPu", self_rep_health)(self_rep_health), 
    stroke = TRUE, 
    color = "black", 
    weight = 0.3,
    label = mytext5,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  addLegend( pal = pal5, values = ~self_rep_health, opacity = 0.9, title = "Self Reported Health", position = "topleft" )

We can see that obesity and hypertension are highly correlated, with a higher percentage of adults being obese and having hypertension in the Bronx and parts of Brooklyn. On the other hand, Manhattan has a much lower percentage of adults being obese and having hypertension. This relationship is further confirmed with life expectancy and self-reported health. Life expectancy at birth is highest for those in Manhattan and parts of Queens. Further, the highest percentage of adults rating their health as “excellent”, “very good” or “good” lies in Manhattan and parts of Brooklyn. Therefore, it appears that adults living in Manhattan are the “healthiest”.

Economic Conditions


Additional Analysis


Discussion